Trie
的變形,將一個英文單字,從左邊開始一個一個移除後形成個別的後綴(Suffix)字,接著按照字典順序進行排序。
以 Banana 為例:
0 banana 5 a
1 anana 按照字典排序 3 ana
2 nana ------------> 1 anana
3 ana 0 banana
4 na 4 na
5 a 2 nana
=====> suffix array [5, 3, 1, 0, 4, 2]
這樣做的好處能夠提升單字搜尋的效率。
製作 Suffix Array
有兩種做法:
Index Suffix
0 banana
1 anana
2 nana
3 ana
4 na
5 a
Index Suffix Rank
0 banana 1
1 anana 0
2 nana 13
3 ana 0
4 na 13
5 a 0
Index Suffix Rank Next Rank
0 banana 1 0
1 anana 0 13
2 nana 13 0
3 ana 0 13
4 na 13 0
5 a 0 -1
Index Suffix Rank Next Rank
5 a 0 -1
1 anana 0 13
3 ana 0 13
0 banana 1 0
2 nana 13 0
4 na 13 0
Index Suffix Rank
5 a 0
1 anana 1
3 ana 1
0 banana 2
2 nana 3
4 na 33
Index Suffix Rank Next Rank
5 a 0 -1
1 anana 1 1
3 ana 1 0
0 banana 2 3
2 nana 3 3
4 na 3 -1
Index Suffix Rank Next Rank
5 a 0 -1
3 ana 1 0
1 anana 1 1
0 banana 2 3
4 na 3 -1
2 nana 3 3
沒錯,LeetCode 標籤上唯一標註為 Suffix Array
的題目只有一題,而且是 Hard。
Given a string s, return the last substring of s in lexicographical order.
Example 1:
Input: "abab"
Output: "bab"
Explanation: The substrings are ["a", "ab", "aba", "abab", "b", "ba", "bab"]. The lexicographically maximum substring is "bab".
Example 2:
Input: "leetcode"
Output: "tcode"
Note:
- 1 <= s.length <= 4 * 10^5
- s contains only lowercase English letters.
使用上述的做法,找出排序在最後的後綴字。
實際刷題才發現,上述演算法還需要再微調,當 s
的長度超過 50000 後,答案會不準。因此實際解題技巧與上述完全不同。
JS
/**
* @param {string} s
* @returns {string}
*/
const lastSubstring = (s) => {
if (s.length < 2) {
return s;
}
let leftSubstring = s.substring(0);
let rightSubstring = s.substring(1);
let head = 0;
while (rightSubstring[head]) {
while (rightSubstring[head] && leftSubstring[head] === rightSubstring[head]) {
head++;
}
if (rightSubstring[head]) {
if (leftSubstring[head] > rightSubstring[head]) {
rightSubstring = rightSubstring.substring(head + 1);
head = 0;
} else {
leftSubstring = rightSubstring;
rightSubstring = leftSubstring.substring(1);
head = 0;
}
}
}
return leftSubstring;
}
Java
class Solution {
public String lastSubstring(String word) {
int length = word.length();
if (length < 2) {
return word;
}
char[] sCharArr = word.toCharArray();
int left = 0;
int right = 1;
int head = 0;
while (right + head < length) {
while (right + head < length && sCharArr[left + head] == sCharArr[right + head]) {
head++;
}
if (right + head < length) {
if (sCharArr[left + head] > sCharArr[right + head]) {
right += (head + 1);
head = 0;
} else {
left = right;
right = left + 1;
head = 0;
}
}
}
return word.substring(left);
}
}
C
char *lastSubstring(char *s)
{
if (s[0] == '\0' || s[1] == '\0')
{
return s;
}
char *leftSubstring = s;
char *rightSubstring = s + 1;
int head = 0;
while (rightSubstring[head])
{
while (rightSubstring[head] && leftSubstring[head] == rightSubstring[head])
{
head++;
}
if (rightSubstring[head])
{
if (leftSubstring[head] > rightSubstring[head])
{
rightSubstring += head + 1;
head = 0;
}
else
{
leftSubstring = rightSubstring;
rightSubstring = leftSubstring + 1;
head = 0;
}
}
}
return leftSubstring;
}
實際解題的想法是這樣:
leftSubstring
與 rightSubstring
head
。leftSubstring
大於 rightSubstring
,那 rightSubstring
往後挪一個字元,形成新的後綴字。leftSubstring
,同時 rightSubstring
往後挪一個字元,形成新的後綴字。rightSubstring
的長度為 0。leftSubstring
就是答案。深深體會到實際理論要應用時,有時情況不是理論般的美好,許多細節要注意,到最後可能理論不使用了,乾脆一個一個比較還比較快速。
當然,也體會到自己學的不夠專精,遇到突發狀況無法修改理論,只好用老方法解題。
Pattern Searching using Suffix Tree
Suffix Array | Set 1 (Introduction)
Suffix Array | Set 2 (nLogn Algorithm)